Cream of the Crop 1




Text File | 1991-12-26 | 72KB | 1,061 lines

*DISCLAIM,A
IMPORTANT: Always consider WATSTAT's recommendations as a STARTING POINT and NOT THE FINAL WORD: they are merely intended to serve as guides to further study and consultation. WATSTAT can only recommend what is USUALLY appropriate, given the specifications you provide. Other unspecified factors may override those that WATSTAT considers. Moreover, it would be unwise to ignore such "non-statistical" factors as: what procedures make the most theoretical sense; what procedures are established and expected in your field; and what procedures you and your readers will be able to interpret.

*RAND,A
NOTE: Since you specified Random Sampling or Random Assignment, it is legitimate to use INFERENTIAL STATISTICS (Significance Tests & Confidence Limits) as well as DESCRIPTIVE STATISTICS. But when you use Inferential statistics, you must still report important Descriptive statistics, such as means & standard deviations, percentages, or correlation coefficients.

*NONRAND,A
NOTE: Since you have a non-random sample, NO INFERENTIAL STATISTICS (such as Significance Tests or Confidence Limits) are appropriate. Hence, WATSTAT will recommend only DESCRIPTIVE STATISTICS.

*WHAT_DES,A
Report all Descriptive statistics needed to characterize your sample (e.g., demographics) and, depending upon your analytical focus, report those that most clearly show: 1) the magnitude of sub-sample differences; 2) the strength & direction of associations; or 3) the characteristics of a single variable's distribution, e.g., its "average," "dispersion," and "shape." In deciding what Descriptive statistics to report, ask yourself: "What information will a reader need to REPLICATE my analysis or to COMPARE my results to those of others?"

*D-UNI-NOM,A
Summarize the distribution with a percentage table and point out the Modal and sparse categories. Optionally, present percentages graphically in a bar or pie chart.
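A percentage table of the kind described above can be sketched in a few lines of Python. This is only an illustration: the data are made up, the function name `percentage_table` is hypothetical, and the 15% cut-off used to flag "sparse" categories is an arbitrary assumption, not a WATSTAT rule.

```python
from collections import Counter


def percentage_table(values):
    """Return (category, count, percent) rows, most frequent category first."""
    counts = Counter(values)
    n = len(values)
    return [(cat, k, 100.0 * k / n) for cat, k in counts.most_common()]


# Hypothetical Nominal data (10 cases).
colors = ["red", "blue", "red", "green", "red",
          "blue", "red", "red", "blue", "red"]

table = percentage_table(colors)
for cat, k, pct in table:
    print(f"{cat:<8}{k:>4}{pct:>8.1f}%")

# The Modal category is the first row; "sparse" uses an assumed 15% threshold.
modal_category = table[0][0]
sparse = [cat for cat, k, pct in table if pct < 15.0]
```

With only 10 cases each case is worth 10 percentage points, which is exactly the situation the small-sample caution below warns about.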
*D-NOM-SMALL,A
CAUTION: Due to your small sample size, each case counts for more than 1% and a seemingly large between-category % difference could be due to very few cases. Take this into account in deciding whether percentage differences reflect important substantive differences in the cases you're describing.

*D-UNI-RANK,A
If your data are inherently in the form of ranks, sample size determines all the key descriptive statistics and there is no need to report them. You should report the number of ties and the ranks on which most ties occur.

If you have an Ordinal variable (not originally in ranks) the Median is the appropriate "average" and the Quartile Deviation the appropriate index of "dispersion." Usually, it is also appropriate to report some additional Percentiles to give a more complete picture of the variable's distribution, for example, the 25th & 75th Percentiles, or the upper and lower Deciles.

*D-UNI-PART,A
If your Ordinal categories allow, compute the Median and Quartile Deviation to index the "average" and "degree of dispersion," respectively. If data are inherently grouped and if it is inappropriate to compute the Median exactly, report the category it falls in and its approximate location in the category.

Summarize the distribution with a percentage table and point out the Modal and sparse categories. Optionally, present percentages graphically in a bar or pie chart.

*D-UNI-INT,A
If your data are dichotomized, report the cut-point that divides the categories and the percentage (or proportion) of cases in each category.

If your data are continuous or grouped into 3 or more categories, use the Mean and Standard Deviation to index the "average" and "dispersion" of the distribution. If the distribution is highly skewed or if there are some extreme values that could make the Mean a "misleading average," report the Median instead of, or in addition to, the Mean.
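As a rough illustration of the two "average"/"dispersion" pairs mentioned above (Median with Quartile Deviation, Mean with Standard Deviation), the sketch below uses Python's standard `statistics` module (Python 3.8 or later). The data and the function name `describe` are hypothetical, and the quartiles follow the module's default "exclusive" method; other texts and packages use slightly different quartile conventions.

```python
import statistics


def describe(values):
    """Summarize one variable with both Interval and Ordinal style indices."""
    q1, _median, q3 = statistics.quantiles(values, n=4)  # 25th, 50th, 75th
    return {
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),          # sample standard deviation
        "median": statistics.median(values),
        "q1": q1,
        "q3": q3,
        "quartile_deviation": (q3 - q1) / 2,     # half the interquartile range
    }


# Hypothetical scores with one extreme value (30) that pulls the Mean upward.
scores = [2, 4, 4, 5, 7, 9, 30]
summary = describe(scores)
for name, value in summary.items():
    print(f"{name:<20}{value:>8.2f}")
```

In this made-up sample the Mean lies well above the Median, which is the "misleading average" situation in which the Median is worth reporting.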
Whether or not the data are skewed, it is usually wise to report some key Percentiles to provide a more complete picture of the distribution, for example, the 25th & 75th Percentiles, or the upper and lower Deciles.

If the data are grouped, a Percentage Table or equivalent graphic (e.g., a bar chart) is usually appropriate. If you don't use a percentage table with grouped data, consider reporting where the Mode falls and which, if any, categories are exceptionally sparse.

If the data are continuous and if it is important to describe the shape of the distribution, consider grouping the data and using procedures noted in the preceding paragraph. Alternatively, you could present the data in a Frequency Polygon (line chart) or in an Ogive (a line chart that shows the cumulative frequency distribution).

*D-COMP1-NOM,A
Percentage tables are usually the best for comparing Nominal distributions across sub-samples. Use Percentage Differences to index the magnitude of sub-sample differences, and point out the Modal and sparse categories for each sub-sample. Optionally, present percentages graphically in bar charts.

*D-COMP2-NOM,A
Percentage tables are usually the best for comparing Nominal distributions across sub-samples. Use Percentage Differences to index the magnitude of sub-sample differences, and point out the Modal and sparse categories for each sub-sample.

Multivariate percentage tables are appropriate for showing differences across two or more Independent (Comparison) variables, especially when there are important Interaction (Specification) effects. However, such tables are more difficult to read, so it is usually advisable to break them into a set of bivariate Partial Tables. Standardized Percentage Tables can be used to adjust for one or more Comparison variables without showing them directly in the tables, but standardization can only be used for Comparison variables that do not Interact with others.
As an alternative to tables, consider presenting percentages graphically in bar charts.

*D-COMP-RANK,A
If your Dependent variable is inherently in the form of ranks, your best option is probably to compare Mean Ranks across sub-samples. However, keep in mind that Mean Ranks are not the same as means computed on Interval data, so the absolute size of sub-sample differences is not meaningful: focus only on "greater-than" and "less-than" relationships between Mean Ranks of your sub-samples. Unless ties are rare, report the number of ties and the ranks on which most ties occur.

If your Ordinal Dependent variable is not ranked, the Median is the appropriate "average" and the Quartile Deviation the appropriate index of "dispersion." Compare Medians across sub-samples, and search for possible "interaction effects" between Comparison variables. Focus on the RELATIVE SIZE of sub-sample Medians (i.e., "greater-than" & "less-than" relations), because the absolute magnitude of Ordinal-scale Medians is not meaningful. Usually, it is also appropriate to report some additional Percentiles (e.g., the 25th & 75th Percentiles or the highest & lowest Deciles) to give a more complete picture of each sub-sample distribution.

*D-COMP-PART,A
The best way to assess differences on a "Partially Ordered" variable depends on whether you're able to compute sub-sample Medians.

If your data allow you to determine Medians exactly, report the Medians for all sub-samples and focus on the RELATIVE SIZE of sub-sample Medians (i.e., "greater-than" & "less-than" relations), since the absolute magnitude of Ordinal-scale Medians is not meaningful. If you have two or more Comparison Variables, search for possible "interactions" between these variables.

If the grouping of data doesn't allow you to compute Medians, you won't be able to compare sub-sample "averages" in a way that takes full advantage of the Dependent variable's Ordinal properties.
The best approach in this case is to present the data in Percentage Tables, which assume only Nominal measurement. (Optionally, present percentages graphically in bar charts.) Use % Differences to index the magnitude of sub-sample differences and point out the Modal and sparse categories for each sub-sample. Since you should be able to specify the CATEGORIES THAT CONTAIN THE MEDIAN for the various sub-samples, you can also base comparisons on the APPROXIMATE location of Medians; since categories are ordered, you should also be able to interpret an approximate difference in Medians as evidence that one sub-sample has a higher "average" than another.

*D-COMP1-INT,A
With Interval Dependent Variables it is usually appropriate to base sub-sample comparisons on Means. Report all sub-sample Means and Standard Deviations.

*D-COMP2-INT,A
If you have two or more Comparison Variables, search for possible interactions. If you have one or more Interval-Level Independent variables that you wish to control ("hold constant"), Analysis of Covariance procedures can be used to adjust sub-sample Means for such variables.

*D-COMP-DICH,A
Percentage tables are usually best for comparing Dichotomous Dependent variables across sub-samples, but it may be appropriate to use Rates or Proportions rather than %'s, especially if the Dependent variable represents a relatively rare occurrence, such as a disease or mortality outcome. [Note that Rates & Proportions may be analyzed and tabulated in much the same way as Percentages, although they are expressed on different scales.]

Use % Differences [or Rate or Proportion Differences] to index the magnitude of sub-sample differences, and point out the Modal and sparse categories for the various sub-samples.

Multivariate tables are appropriate for showing differences across two or more Independent (Comparison) variables, especially when important Interaction (Specification) effects are present.
However, such tables are more difficult to read, so it may be advisable to break them into a set of bivariate Partial Tables. "Standardized Partial Percentage Tables" can be used to adjust for one or more Independent variables without showing them directly in the tables, but standardization can only be used for Independent variables that do not Interact with others.

Instead of tables, consider presenting Percentages [or Rates or Proportions] in graphic charts.

*D-COMP-OTHER2,A
Except for Interval Dependent Variables, there is no procedure designed to handle simultaneous sub-sample comparisons for 2 or more Dependent variables. Your only option is to run a separate analysis for each Dependent variable. To get recommendations appropriate for these separate analyses, return to WATSTAT's Choice Boxes and select an Option other than "2 or More Dependent Variables" in Box 4.

*D-BIVAR-NOM/NOM,A
If the two Nominal variables are dichotomized, use the Phi Coefficient as a measure of association. If either or both of your Nominal variables has 3 or more categories, use Cramer's V, which is the same as Phi except that it adjusts for the number of categories.

*D-BIVAR-NOM/RANK,A
There is no statistic specifically designed to measure the association between a Nominal Dependent variable and an Ordinal Independent variable. Your only choice is to break the Ordinal variable into categories and treat it as Nominal. If you dichotomize it, select a cut-point as close to the Median as possible; if you break it into 3 or more categories, select cut-points that yield approximately equal frequencies across categories. Once the Ordinal variable is categorized, the appropriate statistics are those for two Nominal variables.

If the two Nominal variables are dichotomized, use the Phi Coefficient as a measure of association. If either or both of your Nominal variables has 3 or more categories, use Cramer's V, which is the same as Phi except that it adjusts for the number of categories.
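For readers who want to see how Phi and Cramer's V relate, here is a minimal Python sketch using the usual textbook formulas: Phi = (ad - bc) / sqrt(product of the four marginal totals) for a 2x2 table, and V = sqrt(chi-square / (N * (k - 1))), where k is the smaller of the number of rows and columns. The function names and the example tables are hypothetical.

```python
import math


def chi_square(table):
    """Pearson chi-square statistic and total N for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2, n


def phi_coefficient(table):
    """Signed Phi for a 2x2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))


def cramers_v(table):
    """Cramer's V; for a 2x2 table it equals the absolute value of Phi."""
    chi2, n = chi_square(table)
    k = min(len(table), len(table[0]))
    return math.sqrt(chi2 / (n * (k - 1)))


two_by_two = [[30, 10], [10, 30]]            # hypothetical dichotomies
three_by_two = [[10, 20], [20, 10], [15, 15]]  # hypothetical 3-category case

print(phi_coefficient(two_by_two))
print(cramers_v(two_by_two))
print(cramers_v(three_by_two))
```

The sketch assumes every row and column total is greater than zero; empty categories should be dropped or collapsed first.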
*D-BIVAR-NOM/PART,A
There is no statistic specifically designed to measure the association between a Nominal Dependent variable and an Independent variable that is cast in the form of Ordinal categories. Your only choice is to treat the Ordinal variable as if it were a set of Nominal categories, and the only appropriate statistics are those for two Nominal variables.

If the two Nominal variables are dichotomized, use the Phi Coefficient as a measure of association. If either or both of your Nominal variables has 3 or more categories, use Cramer's V, which is the same as Phi except that it adjusts for the number of categories.

*D-BIVAR-NOM/INT,A
There is no statistic specifically designed to measure the association between a Nominal Dependent variable and an Interval Independent variable, so you have two OPTIONS: 1) break the Interval variable into categories and treat it as Nominal, or 2) dichotomize the Dependent variable and treat it as Interval.

If you choose OPTION 1, break the Independent variable into categories that contain approximately equal numbers of cases. Once this is done, the appropriate statistics are those for two Nominal variables.

If the two Nominal variables are dichotomized, use the Phi Coefficient as a measure of association. If either or both of your Nominal variables has 3 or more categories, use Cramer's V, which is the same as Phi except that it adjusts for the number of categories.

If you choose OPTION 2, dichotomize the Dependent variable as close as possible to the Median unless there is theoretical justification for using another "high vs. low" cut-point. The dichotomized Dependent variable may now be assigned arbitrary scores of 0 for "low" and 1 for "high" and may, within limits, be treated as an Interval scale. Once this is done, you can use the Linear Correlation Coefficient (Pearson's r and r-squared) to index the strength and direction of the relationship.
But if your problem calls for regression statistics, Linear Regression may not be appropriate: with a dichotomous Dependent variable some predicted (Y') scores may have impossible values (less than 0 or greater than 1). If these impossible values are numerous or if they will cause problems in interpreting your results, use Logistic Regression instead.

*D-BIVAR-RANK/NOM,A
There is no statistic specifically designed to measure the association between an Ordinal Dependent variable and a Nominal Independent variable. Your only choice is to break the Ordinal variable into categories and treat it as Nominal. If you dichotomize it, select a cut-point as close to the Median as possible; if you break it into 3 or more categories, select cut-points that yield approximately equal frequencies across categories. Once the Ordinal variable is categorized, the appropriate statistics are those for two Nominal variables.

If the two Nominal variables are dichotomized, use the Phi Coefficient as a measure of association. If either or both of your Nominal variables has 3 or more categories, use Cramer's V, which is the same as Phi except that it adjusts for the number of categories.

*D-BIVAR-RANK/RANK,A
If both variables are in the form of ranks, you can proceed to compute one of the measures of association noted below. Otherwise, you must transform them to ranks before proceeding.

Spearman's Rho is the best known measure of association for two Ordinal variables and, because it is simply the Linear Correlation Coefficient (Pearson's r) applied to ranks, it is often interpreted as an approximate index of linear correlation. The "correction for ties" should be applied to Rho, but it has little effect if fewer than 30% of the cases are tied.

In some fields the preferred statistic is Kendall's Tau, which, unlike Spearman's Rho, does not involve any arithmetical operations that assume an underlying Interval Scale.
This statistic is sometimes referred to as "Tau-A" to distinguish it from modified forms ("Tau-B" and "Tau-C") that are applied to "ordered contingency tables." The computing formulas for Tau-A found in most texts incorporate a correction for tied ranks.

*D-BIVAR-RANK/PART,A
There is no statistic specifically designed to measure the association between a "true" Ordinal Dependent variable and a "partially ordered" Independent variable. Your best choice is to break the Dependent variable into ordered categories and treat both variables as "partially ordered."

Prior to computations, copy the data into a contingency table in which rows are categories of the Dependent variable and columns are categories of the Independent variable. Use one of the following measures of association:

The best statistic for most ordered contingency tables is a modified form of Kendall's Tau: use Tau-B if the number of rows in the table equals the number of columns; use Tau-C if the table is not "square."

*D-BIVAR-RANK/INT,A
There is no statistic specifically designed to measure the association between an Ordinal Dependent variable and an Interval Independent variable. If you can't assume that the Dependent variable is Interval, you'll have to "downgrade" the Independent variable and treat it as an Ordinal scale. If you can transform it to ranks, do so, and apply one of the measures of association recommended below. [If it is so grouped that it can only be transformed into a set of ordered categories, go back through WATSTAT's Choice Boxes and pick Option 3, "Ordered Categories," as the Level of Measurement for the Independent variable.]

Spearman's Rho is the best known measure of association for two Ordinal variables and, because it is simply the Linear Correlation Coefficient (Pearson's r) applied to ranks, it is often interpreted as an approximate index of linear correlation.
The "correction for ties" should be applied to Rho, but it has little effect if fewer than 30% of the cases are tied.

In some fields the preferred statistic is Kendall's Tau, which, unlike Spearman's Rho, does not involve any arithmetical operations that assume an underlying Interval Scale. This statistic is sometimes referred to as "Tau-A" to distinguish it from modified forms ("Tau-B" and "Tau-C") that are applied to "ordered contingency tables." The computing formulas for Tau-A found in most texts incorporate a correction for tied ranks.

*D-BIVAR-PART/NOM,A
There is no statistic specifically designed to measure the association between a set of ordered categories and a Nominal Independent variable, and your only option is to "downgrade" the Dependent variable to the Nominal level. For two Nominal variables the following recommendations apply.

If the two Nominal variables are dichotomized, use the Phi Coefficient as a measure of association. If either or both of your Nominal variables has 3 or more categories, use Cramer's V, which is the same as Phi except that it adjusts for the number of categories.

*D-BIVAR-PART/RANK,A
There is no statistic specifically designed to measure the association between a "partially ordered" Dependent variable and a "true" Ordinal Independent variable. Your best choice is to break the Independent variable into ordered categories and treat both variables as "partially ordered."

Prior to computations, copy the data into a contingency table in which rows are categories of the Dependent variable and columns are categories of the Independent variable. Use one of the following measures of association:

The best statistic for most ordered contingency tables is a modified form of Kendall's Tau: use Tau-B if the number of rows in the table equals the number of columns; use Tau-C if the table is not "square."
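The Tau-B / Tau-C rule for ordered contingency tables can be sketched in Python as follows. This is an illustration under stated assumptions: rows and columns are both listed from "low" to "high", the formulas are the common textbook ones (Tau-B = (P - Q) / sqrt((N0 - T_rows)(N0 - T_cols)) and Tau-C = 2m(P - Q) / (N^2 (m - 1)), with P and Q the concordant and discordant pair counts and m the smaller table dimension), and all function names and data are hypothetical.

```python
import math


def concordant_discordant(table):
    """Count concordant (P) and discordant (Q) pairs in an ordered table."""
    rows, cols = len(table), len(table[0])
    p = q = 0
    for i in range(rows):
        for j in range(cols):
            for k in range(i + 1, rows):       # only cells in lower rows
                for l in range(cols):
                    if l > j:                  # higher on both variables
                        p += table[i][j] * table[k][l]
                    elif l < j:                # higher on one, lower on other
                        q += table[i][j] * table[k][l]
    return p, q


def pairs(n):
    """Number of distinct pairs that can be formed from n cases."""
    return n * (n - 1) // 2


def tau_b(table):
    p, q = concordant_discordant(table)
    n = sum(sum(row) for row in table)
    tied_rows = sum(pairs(sum(row)) for row in table)
    tied_cols = sum(pairs(sum(col)) for col in zip(*table))
    return (p - q) / math.sqrt((pairs(n) - tied_rows) * (pairs(n) - tied_cols))


def tau_c(table):
    p, q = concordant_discordant(table)
    n = sum(sum(row) for row in table)
    m = min(len(table), len(table[0]))
    return 2 * m * (p - q) / (n ** 2 * (m - 1))


def ordinal_tau(table):
    """Apply the rule above: Tau-B for square tables, Tau-C otherwise."""
    return tau_b(table) if len(table) == len(table[0]) else tau_c(table)


square = [[20, 5], [5, 20]]                 # hypothetical 2x2 ordered table
rectangular = [[12, 6, 2], [3, 7, 10]]      # hypothetical 2x3 ordered table
print(ordinal_tau(square))
print(ordinal_tau(rectangular))
```

The four nested loops are fine for the small tables typical of grouped Ordinal data; they are not meant for large tables.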
*D-BIVAR-PART/PART,A
Prior to computations, copy the data into a contingency table in which rows are categories of the Dependent variable and columns are categories of the Independent variable. Use one of the following measures of association:

The best statistic for most ordered contingency tables is a modified form of Kendall's Tau: use Tau-B if the number of rows in the table equals the number of columns; use Tau-C if the table is not "square."

*D-BIVAR-PART/INT,A
There is no statistic specifically designed to measure the association between a "partially ordered" Dependent variable and an Interval Independent variable. The best alternative is to break the Independent variable into ordered categories and treat both variables as "partially ordered."

Prior to your computations, copy the data into a contingency table in which rows are categories of the Dependent variable and columns are categories of the Independent variable. Then use one of the following indices of association:

The best statistic for most ordered contingency tables is a modified form of Kendall's Tau: use Tau-B if the number of rows in the table equals the number of columns; use Tau-C if the table is not "square."

*D-BIVAR-INT/NOM,A
The preferred measure of association for an Interval Dependent variable and a Nominal Independent variable is the Correlation Ratio (Eta). The Eta statistic indexes the strength of a relationship of any form, including non-monotonic (e.g., U-shaped). Eta-Squared is commonly reported instead of Eta, since it has a more meaningful interpretation: it measures the proportion of variance in the Dependent variable explained by the categories of the Independent variable.

*D-BIVAR-INT/RANK,A
There is no statistic specifically designed to measure the association between an Interval Dependent variable and an Ordinal Independent variable. If you can't assume that the Independent variable is Interval, you'll have to "downgrade" the Dependent variable and treat it as an Ordinal scale.
If you can transform it to ranks, do so, and apply one of the measures of association recommended below. [If it is so grouped that it can only be transformed into a set of ordered categories, go back through WATSTAT's Choice Boxes and pick Option 3, "Ordered Categories," as the Level of Measurement for the Dependent variable.]

Spearman's Rho is the best known measure of association for two Ordinal variables and, because it is simply the Linear Correlation Coefficient (Pearson's r) applied to ranks, it is often interpreted as an approximate index of linear correlation. The "correction for ties" should be applied to Rho, but it has little effect if fewer than 30% of the cases are tied.

In some fields the preferred statistic is Kendall's Tau, which, unlike Spearman's Rho, does not involve any arithmetical operations that assume an underlying Interval Scale. This statistic is sometimes referred to as "Tau-A" to distinguish it from modified forms ("Tau-B" and "Tau-C") that are applied to "ordered contingency tables." The computing formulas for Tau-A found in most texts incorporate a correction for tied ranks.

*D-BIVAR-INT/PART,A
There is no statistic specifically designed to measure the association between an Interval Dependent variable and a "partially ordered" Independent variable, so you have 2 OPTIONS: 1) "downgrade" the Dependent variable by breaking it into ordered categories, or 2) "downgrade" the Independent variable to a Nominal scale. OPTION 2 is the best choice if you're interested mainly in the strength of the relationship, but since the Independent variable is assumed to be merely Nominal, you won't be able to determine the direction (+/-) of the relationship.

If you choose OPTION 1, you should break the Dependent variable into categories that contain approximately equal numbers of cases. Copy the data into a contingency table in which rows are categories of the Dependent variable and columns are categories of the Independent variable.
Then compute one of the following indices recommended for ordered contingency tables.

The best statistic for most ordered contingency tables is a modified form of Kendall's Tau: use Tau-B if the number of rows in the table equals the number of columns; use Tau-C if the table is not "square."

If you choose OPTION 2, every category of the Independent variable MUST contain at least 2 cases (preferably more), so you might have to collapse some sparse categories. However, categories should not be collapsed without restraint: it is also desirable to have as many categories as possible.

The preferred measure of association for an Interval Dependent variable and a Nominal Independent variable is the Correlation Ratio (Eta). The Eta statistic indexes the strength of a relationship of any form, including non-monotonic (e.g., U-shaped). The square of Eta (Eta-Squared) is commonly reported instead of Eta, since it has a more meaningful interpretation: it measures the proportion of variance in the Dependent variable explained by the categories of the Independent variable.

*D-BIVAR-INT/INT,A
In most situations the preferred index of association for two Interval variables is the Linear Correlation Coefficient, also called Pearson's r. The square of the r statistic, known as the Coefficient of Determination, is often reported along with r, because it measures the proportion of variance in one variable explained by the other.

If you're interested in predicting or estimating scores on the Dependent variable from those on the Independent variable, you should compute the Linear Regression statistics: the Regression Coefficient, the Y-Intercept, and the Standard Error of Estimate.

If you suspect that the relationship departs markedly from linearity, so that Pearson's r underestimates its "true" strength, you can use the Correlation Ratio (Eta) instead.
This will require breaking the Independent variable into a set of categories, preferably in such a way that 5 or more cases fall in each category. Eta indexes the strength of a relationship of any form, including those which are non-monotonic (e.g., U-shaped). Eta-Squared is commonly reported instead of Eta, because it has a more meaningful interpretation: it measures the proportion of variance in the Dependent variable explained by the categories of the Independent variable.

*D-BIVAR-DICH/NOM,A
Even if your dichotomous Dependent variable is Ordinal or Interval, it is probably best to treat it as Nominal, like your Independent variable, and use a measure of association for two Nominal variables.

If the two Nominal variables are dichotomized, use the Phi Coefficient as a measure of association. If either or both of your Nominal variables has 3 or more categories, use Cramer's V, which is the same as Phi except that it adjusts for the number of categories.

*D-BIVAR-DICH/RANK,A
There is no statistic specifically designed to measure the association between a dichotomous Dependent variable and an Ordinal Independent variable. You'll first have to break the Independent variable into categories and then you'll have 2 OPTIONS: 1) assume the Dependent variable is Ordinal and use a measure of association for two "partially ordered" variables, or 2) assume that both variables are merely Nominal and use a measure for two Nominal variables. Option 1 is usually preferable, but choose Option 2 if it makes no sense to treat the dichotomous Dependent variable as Ordinal.

If you choose Option 1, copy the data into an ordered contingency table and compute one of the following:

The best statistic for most ordered contingency tables is a modified form of Kendall's Tau: use Tau-B if the number of rows in the table equals the number of columns; use Tau-C if the table is not "square."
If you choose Option 2, copy the data into a contingency table, making no assumption about the order of rows & columns. Then use one of the following measures appropriate for two Nominal scales:

If the two Nominal variables are dichotomized, use the Phi Coefficient as a measure of association. If either or both of your Nominal variables has 3 or more categories, use Cramer's V, which is the same as Phi except that it adjusts for the number of categories.

*D-BIVAR-DICH/PART,A
With a dichotomous Dependent variable and a "partially ordered" Independent variable, you have 2 OPTIONS: 1) assume the Dependent variable is also Ordinal and use a measure of association for two "partially ordered" variables, or 2) assume the Independent variable is only Nominal and use a measure of association for two Nominal variables. Option 1 is usually better.

If you choose Option 1, copy the data into an ordered contingency table and compute one of the following:

The best statistic for most ordered contingency tables is a modified form of Kendall's Tau: use Tau-B if the number of rows in the table equals the number of columns; use Tau-C if the table is not "square."

If you choose Option 2, copy the data into a contingency table, making no assumption about the order of rows & columns. Then use one of the following measures appropriate for two Nominal scales:

If the two Nominal variables are dichotomized, use the Phi Coefficient as a measure of association. If either or both of your Nominal variables has 3 or more categories, use Cramer's V, which is the same as Phi except that it adjusts for the number of categories.

*D-BIVAR-DICH/INT,A
With a dichotomous Dependent variable and an Interval Independent variable, you have 2 OPTIONS: 1) assume that the dichotomy is an Interval variable, or 2) "downgrade" the Independent variable to the Nominal level. For Option 1, which is usually preferable, you'd use a measure of association for two Interval variables.
For Option 2, you'd first break the Independent variable into categories and use a measure of association for two Nominal variables.

If you choose OPTION 1, assign arbitrary scores of 0 (low) and 1 (high) to categories of the Dependent variable. Then use the Linear Correlation Coefficient (Pearson's r and r-squared) to measure the strength and direction (+/-) of the relationship. If you're mainly interested in predicting Dependent variable scores from those on the Independent variable, compute regression statistics (Regression Coefficient, Y-Intercept, & Standard Error of Estimate). But note that Linear Regression may not be appropriate: with a dichotomous Dependent variable, some scores predicted from the regression equation (Y' = A + bX) may have impossible values (i.e., less than 0 or greater than 1). If there are many impossible values or if they will cause problems in interpreting your results, use Logistic Regression instead.

If you take OPTION 2, divide the Independent variable into categories that contain about the same number of cases and use one of the following:

If the two Nominal variables are dichotomized, use the Phi Coefficient as a measure of association. If either or both of your Nominal variables has 3 or more categories, use Cramer's V, which is the same as Phi except that it adjusts for the number of categories.

*D-MUL-SMALL-INT,A
WARNING: The SAMPLE SIZE you specified may be TOO SMALL to support the type of multivariate procedure(s) WATSTAT recommended. As a practical rule of thumb you should have a minimum of about 10 cases for each variable in such procedures. To meet this criterion you may have to drop some variables from the analysis. If you can't drop enough to approach the 10-case-per-variable criterion, you shouldn't use the above procedure(s).

*D-MUL-SMALL-NOM,A
WARNING: The SAMPLE SIZE you specified may be TOO SMALL to use Multivariate Procedures for Nominal Variables, of the sort recommended.
Computations for such methods are based on cross-tabulations, and as the number of variables (& categories) increases, cell frequencies can become too sparse to support the analysis. You may need to drop some variables from the analysis and/or collapse variables into fewer categories.

*D-MUL-1DEP-NOM/NOM,A
The recommended procedure (and the only one available) for measuring the association between a Nominal-level Dependent and a set of Nominal Independent variables is Log-Linear Analysis. In most cases, this procedure will require the use of a computer and many popular statistical software packages can run it. A good deal of statistical sophistication is required to apply it and to interpret its results. Log-Linear Analysis may not be widely used in your field and, if not, the task of reporting your results will be somewhat more difficult. The use of Log-Linear Analysis is also limited by the substantial sample size it usually requires.

However, no alternative procedure is applicable unless you're willing to dichotomize the Dependent variable (so it can be scored 0/1 and treated as Interval) and to transform all the Independent variables and also treat them as Interval. The latter step would involve either: 1) dichotomizing each Independent variable and assigning "0" & "1" scores to its categories; or 2) creating a set of "dummy variables" (each scored 0/1) to represent its categories. After these transformations, you can apply either Logistic Regression or Discriminant Analysis. For more info about these procedures, return to WATSTAT's Choice Boxes and specify "Dichotomous" for the Dependent (Box 5) variable & "Interval" for the Independent (Box 6) variables.

*D-MUL-1DEP-NOM/INT,A
The only procedure designed to assess the association between a Nominal Dependent & a set of Interval Independent variables is Discriminant Analysis.
This procedure does not produce a single index (analogous to a correlation coefficient), but instead yields a set of prediction equations, called "Discriminant Functions," the interpretation of which requires a good deal of statistical expertise. Computations must be done by computer and most statistical software packages include Discriminant Analysis routines. Interpretation of results is considerably simpler if the Dependent variable is dichotomized, but if this is done, Logistic Regression and Multiple Correlation/Regression would also be applicable and perhaps preferable.
*D-MUL-1DEP-NOM/MIXIO,A There is no procedure available to measure association between a Nominal Dependent variable and Independent variables with "mixed" levels of measurement, so you'll need to transform one or more Independent variables to make them all either Nominal or Interval. In the former case, you'd simply break your Interval or Ordinal variables into categories and proceed as if they were Nominal. In the latter, you'd transform each Ordinal or Nominal independent variable to Interval by either: 1) dichotomizing it and assigning scores of "0" and "1" to its categories; or 2) breaking it into categories and creating a set of "dummy variables" (each scored 0/1) to represent its categories. If all Independent variables are Nominal, Log-Linear Analysis may be used. For more info about Log-Linear Analysis, return to WATSTAT's Choice Boxes and specify "Nominal" measurement for both the Dependent (Box 5) and the Independent (Box 6) variables. If all Independent variables are Interval (including dichotomies and dummy variables), you can use Discriminant Analysis. For more info about Discriminant Analysis, return to WATSTAT's Choice Boxes and specify "Nominal" for the Dependent (Box 5) and "Interval" for the Independent (Box 6) variables.
*D-MUL-1DEP-NOM/ORD,A There is no procedure available to measure association between a Nominal Dependent variable and Ordinal Independent variables.
Your best alternative is to categorize the Ordinal variables and treat them as Nominal; then you can use Log-Linear Analysis. For more information on Log-Linear Analysis, return to WATSTAT's Choice Boxes and specify "Nominal" measurement for both the Dependent (Box 5) and the Independent (Box 6) variables.
*D-MUL-1DEP-ORD/ALL,A There is no multivariate procedure designed to measure the association between an Ordinal Dependent variable and a set of 2 or more Independent variables. However, if you transform the Dependent variable (and perhaps the Independent variables), a number of alternatives may be applicable. You have 2 basic OPTIONS: 1) dichotomize the Dependent variable and treat it as Interval, or 2) break the Dependent variable into 2 or more categories and treat it as Nominal. OPTION 1 is preferable as long as it makes sense to dichotomize the Dependent variable.
If you take OPTION 1, you can use either Multiple Regression/Correlation or Logistic Regression, BUT to do so all your Independent variables must also be Interval or Dichotomies (i.e., Nominal and Ordinal Independent variables must be dichotomized or represented as sets of "dummy variables"). For more info about Multiple Regression/Correlation, return to WATSTAT's Choice Boxes and choose "Interval" measurement for both the Dependent variable (Box 5) and the Independent (Box 6) variables. For more information on Logistic Regression, specify "Dichotomy" (Box 5) and "Interval" (Box 6).
With OPTION 2, you can use either Discriminant Analysis or Log-Linear Analysis. To use Discriminant Analysis, all Independent variables must be Interval (i.e., Nominal & Ordinal Independent variables must be dichotomized or represented as sets of "dummy variables"). With Log-Linear Analysis, all Independent variables must be Nominal (i.e., Ordinal & Interval variables must be represented as sets of 2 or more Nominal categories).
For more info about Discriminant Analysis, return to WATSTAT's Choice Boxes and specify "Nominal" for the Dependent (Box 5) and "Interval" for the Independent (Box 6) variables. For more info about Log-Linear Analysis, specify "Nominal" for both Dependent (Box 5) and Independent (Box 6) variables.
*D-MUL-1DEP-INT/INT,A If your Dependent variable is Interval and all your Independent variables are also Interval (or dichotomies), your best choice is Multiple Regression/Correlation. Use the Multiple Correlation statistics (R and R-Squared) to index the strength of the relation between the Dependent variable and all the Independent variables jointly. Use the Regression Coefficients (b) to index the effect of each Independent variable and use the Standard Error of Estimate to index the precision with which the set of Independent variables predicts (estimates) scores on the Dependent variable.
*D-MUL-1DEP-INT/OTHER,A There is no multivariate procedure designed to relate an Interval dependent variable with Nominal or Ordinal Independent variables. However, after some simple transformations, you can treat Nominal and Ordinal variables as if they were Interval and use Multiple Correlation/Regression procedures. Dichotomous Independent variables (scored 1/0) can be treated as Interval in these procedures and you can dichotomize whenever it makes sense to treat a Nominal variable as "present" vs. "absent" (1 vs. 0) or an Ordinal variable as "high" vs. "low" (1 vs. 0). However, it is often desirable to preserve a more detailed representation of Nominal & Ordinal variables: this can be done by dividing them into categories and using a SET of dichotomous variables, called "dummy variables," to represent the categories. Use the Multiple Correlation statistics (R and R-Squared) to index the strength of the relation between the Dependent variable and all the independent variables operating jointly.
Use the Regression Coefficients (b-values) to index the effect of each Independent variable and use the Standard Error of Estimate to index the precision with which the set of Independent variables predicts (estimates) scores on the Dependent variable.
*D-MUL-1DEP-DICH/NOM,A Log-Linear Analysis is specifically designed to assess association between a Nominal Dependent variable and a set of Nominal Independent variables. The fact that your Dependent variable is dichotomous presents no problems, as long as it makes sense to treat it as a Nominal variable.
*D-MUL-1DEP-DICH/ORD,A There is no procedure designed to measure association between a dichotomous Dependent variable and Ordinal Independent variables. Your best alternative is to categorize the Ordinal variables and treat them as Nominal; then you can use Log-Linear Analysis. For more information about Log-Linear Analysis, return to WATSTAT's Choice Boxes and specify "Nominal" measurement for both Dependent (Box 5) and Independent (Box 6) variables.
*D-MUL-1DEP-DICH/INT,A Several multivariate procedures are potentially applicable if the dependent variable is a dichotomy and all the Independent variables are Interval. In order of preference, the available options include: Logistic Regression, Discriminant Analysis, & Multiple Correlation/Regression. Logistic Regression is almost certain to be applicable. Discriminant Analysis is a good alternative when category frequencies on the Dependent variable approach a 50%/50% split, but should not be used when the split is more extreme than 80%/20%. Multiple Correlation/Regression is less generally applicable when the Dependent variable is a dichotomy: although the Dependent variable is scored 0 and 1 (for "low" & "high") some predicted (Y') scores may attain impossible values (less than 0 or greater than 1).
If there are many impossible values, or if such values will cause problems in interpreting your results, Multiple (Linear) Correlation/Regression should NOT be used.
*D-MUL-1DEP-DICH/MIXON,A There is no procedure designed to measure association between a dichotomous Dependent variable and "mixed" Ordinal/Nominal Independent variables. Your best alternative is to categorize the Ordinal variables and treat them as Nominal; then you can use Log-Linear Analysis, which assumes that all the Independent variables are Nominal. For more info about Log-Linear Analysis, return to WATSTAT's Choice Boxes and specify "Nominal" measurement for both Dependent (Box 5) and Independent (Box 6) variables.
*D-MUL-1DEP-DICH/MIXIO,A There is no procedure designed to measure association between a dichotomous Dependent variable and Independent variables with "mixed" measurement levels, so you'll need to transform one or more Independent variables to make them ALL either Nominal or Interval. In the former case, you'd simply break any Interval or Ordinal variables into categories and proceed as if they were Nominal. In the latter, you'd transform each Ordinal or Nominal Independent variable to Interval by either: 1) dichotomizing it and assigning scores of "0" and "1" to its categories; or 2) breaking it into categories and creating a set of "dummy variables" (each scored 0/1) to represent the categories. If all Independent variables can be treated as Nominal, you can use Log-Linear Analysis. For more info about Log-Linear Analysis, return to WATSTAT's Choice Boxes and specify "Nominal" measurement for both Dependent (Box 5) and Independent (Box 6) variables. If all Independent variables are Interval (including dichotomies and dummy variables), you can use Logistic Regression or Discriminant Analysis. For more info about these procedures, return to WATSTAT's Choice Boxes and specify "Dichotomy" for the Dependent (Box 5) variable and "Interval" for the Independent (Box 6) variables.
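The "dummy variable" transformation mentioned in the entries above can be sketched in a few lines of Python. This is only an illustration: the function name `dummy_code`, the sample data, and the choice to omit one "reference" category (a common convention, so that the set of dummies is not redundant) are assumptions, not part of WATSTAT.

```python
def dummy_code(values, reference=None):
    """Represent a Nominal variable as a set of 0/1 dummy variables.

    One category (the "reference") is left out, so a variable with
    k categories yields k - 1 dummy variables. Hypothetical helper.
    """
    categories = sorted(set(values))
    if reference is None:
        reference = categories[0]  # assumption: first category in sort order
    kept = [c for c in categories if c != reference]
    # Each dummy is scored 1 where the case falls in that category, else 0.
    return {c: [1 if v == c else 0 for v in values] for c in kept}


# Hypothetical Nominal variable with three categories.
region = ["north", "south", "west", "south", "north"]
dummies = dummy_code(region, reference="north")
print(dummies)
```

Each resulting 0/1 column can then be treated as Interval in the procedures the entry names.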
*D-MUL-2DEP-INT/INT,A Several multivariate procedures are potentially applicable when all your variables are Interval and you're dealing with 2 or more Dependent variables simultaneously. They include: Canonical Correlation; measures of association derived from MANOVA; and various Structural Equation Modelling procedures, e.g., LISREL and EQS. All these assume advanced statistical training and must be performed by computer. Moreover, so much additional information is needed to choose from these alternatives that WATSTAT cannot recommend a "best" procedure here.
*D-MUL-2DEP-INT/NOTINT,A Several multivariate procedures are potentially applicable when you're dealing with 2 or more Dependent variables simultaneously. They include: Canonical Correlation, measures of association derived from MANOVA, and various procedures for Structural Equation Modelling (e.g., LISREL and EQS). However, all require advanced statistical training and must be performed by computer. Further, all assume Interval measurement for ALL variables, so you won't be able to use them unless you drop "lower-level" variables or transform them to sets of dummy variables. Finally, so much additional information is needed to choose from these alternatives that WATSTAT can't recommend a "best" procedure here.
*D-MUL-2DEP-NOTINT,A Several multivariate procedures are potentially applicable when you're dealing with 2 or more Dependent variables simultaneously. They include: Canonical Correlation, measures of association derived from MANOVA, and various procedures for Structural Equation Modelling (e.g., LISREL and EQS). However, all require advanced statistical training and must be performed by computer. Further, all assume Interval measurement for ALL variables in the analysis, so you probably won't be able to use them. Finally, so much additional information is needed to choose from these alternatives that WATSTAT can't recommend a "best" procedure here.
*D-MUL-NODEP-INT,A Factor Analysis is recommended for assessing relationships among several Interval-level variables when there is no Dependent variable identified. [Dichotomous variables, scored 0/1, may also be Factor Analyzed.] There are many types of Factor Analysis and selecting the appropriate type is too complicated for WATSTAT to handle: you'll need to consult a specialized text on Factor Analysis. Computations require a computer, and most popular statistical packages offer a variety of Factor Analysis procedures. [The manuals for some of these packages are good sources of advice on which type of Factor Analysis to apply.]
*D-MUL-NODEP-RANK,A Kendall's Coefficient of Concordance (Kendall's W) is designed to assess relationships among 3 or more Ordinal variables when there is no Dependent variable identified. All variables must be transformed to RANKS if they are not inherently in rank form. The interpretation of Kendall's W is facilitated by its linear relationship to "Average Rho," i.e., the mean rank-order correlation (Spearman's Rho) between all possible pairs of variables.
*D-MUL-NODEP-NOTINT,A Factor Analysis is the only widely-used procedure designed to assess relationships among several variables when there is no Dependent variable identified. Unfortunately, this procedure assumes that all variables are Interval, so you can't use it for your "lower level" variables. However, dichotomies (scored 0/1) may be treated as Interval here, so if you can dichotomize your "lower level" variables, you can apply Factor Analysis.
*S-UNI-NOM,A Assuming only Nominal Measurement, the Chi-Square Goodness-of-Fit Test may be used to test whether it's likely that your RANDOM SAMPLE came from a POPULATION with an hypothesized proportion of cases in its various categories. You specify the Population proportions (P) in the Null Hypothesis and multiply each P by Sample Size to obtain EXPECTED FREQUENCIES for the test.
Within limits, you may specify any set of P's derived from theory or prior knowledge of a relevant population. If your variable is Dichotomous, the Binomial Test is preferable to the Chi-Square Goodness-of-Fit, especially when sample size is small. Use Exact Binomial Tables for small sample sizes and the Normal Approximation (z-Test) for larger (>25) samples.
*S-UNI-RANK,A In the special situation where "scores" or Ranks represent a SEQUENCE of cases, the so-called "Test for Runs Up and Down" can be used to test for a TREND, i.e., a tendency for scores to increase or decrease over a sequence. If data are NOT SEQUENCED and NOT RANKED, your best alternative is to categorize the data and to apply a test designed for "Partially Ordered" data (One-Sample Kolmogorov-Smirnov Test) or Nominal data (Chi-Square Goodness-of-Fit Test). There is no Univariate test for UNSEQUENCED RANKS.
*S-UNI-PART,A The Kolmogorov-Smirnov One-Sample Test is recommended for a Categorized Ordinal ("Partially Ordered") variable. It tests the Null Hypothesis that the random sample was drawn from a Population with some specified Proportion of cases in the various categories: you specify these Proportions based on theory or prior information about the Population.
*S-UNI-INT,A Use the One-Sample t-Test to determine whether it is likely that your sample was DRAWN FROM A POPULATION WITH A KNOWN (or guessed) MEAN, which you specify in the Null Hypothesis. Besides requiring INTERVAL MEASUREMENT, valid application of this test assumes the sample was drawn from a NORMALLY DISTRIBUTED POPULATION. Check to see that your data adequately meet these assumptions: most intro. texts explain conditions under which they may be relaxed.
If you're interested in estimating the MEAN of the POPULATION from which your RANDOM SAMPLE was drawn, compute CONFIDENCE LIMITS FOR THE MEAN.
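As a rough illustration of the One-Sample t-Test and of Confidence Limits for the Mean, the following Python sketch uses the usual textbook formulas, t = (mean - hypothesized mean) / (s / sqrt(n)) and mean +/- t_crit * s / sqrt(n). The function names and the scores are made up for the example, and the critical t value must be looked up in a t table for your own degrees of freedom (2.365 is the two-tailed .05 value for 7 degrees of freedom).

```python
import math
import statistics


def one_sample_t(sample, mu0):
    """Return (t, df) for a test of the Null Hypothesis that the population mean is mu0."""
    n = len(sample)
    std_error = statistics.stdev(sample) / math.sqrt(n)  # s / sqrt(n)
    return (statistics.mean(sample) - mu0) / std_error, n - 1


def confidence_limits(sample, t_crit):
    """Return (lower, upper) limits for the mean, given a critical t from a table."""
    n = len(sample)
    mean = statistics.mean(sample)
    std_error = statistics.stdev(sample) / math.sqrt(n)
    return mean - t_crit * std_error, mean + t_crit * std_error


# Hypothetical Interval-level scores from a random sample of 8 cases.
scores = [12, 15, 11, 14, 13, 16, 12, 15]
t, df = one_sample_t(scores, mu0=12)
low, high = confidence_limits(scores, t_crit=2.365)  # assumed table value, df = 7
print(round(t, 3), df, round(low, 2), round(high, 2))
```

Comparing the computed t with the tabled critical value (or checking whether the hypothesized mean falls outside the limits) gives the test decision.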
If you're interested in the SHAPE of your variable's distribution, use the Chi-Square Goodness-of-Fit Test to see if it's likely that your SAMPLE was drawn from a POPULATION with an hypothesized proportion of cases in its various categories. You specify the Population Proportions (P) in the NULL Hypothesis and multiply each P by Sample N to get EXPECTED FREQUENCIES for the test. Within limits, you may hypothesize any set of P's derived from theory or prior knowledge of a population. If you get the P's from a table of the Normal Distribution, you can use the Chi-Square Goodness-of-Fit Test to see whether it's likely that your sample came from a NORMALLY DISTRIBUTED POPULATION.
*S-2SAMPLE-INT,A Use Student's t-Test to compare TWO SUB-SAMPLE MEANS on an INTERVAL DEPENDENT VARIABLE, where RANDOM SAMPLING or RANDOM ASSIGNMENT of cases has yielded INDEPENDENT SUB-SAMPLES. Valid application of this test assumes: 1) that sub-samples were drawn from two NORMALLY DISTRIBUTED POPULATIONS, & 2) that the two parent POPULATIONS have EQUAL VARIANCES. Check to see that your data approximate these assumptions: most intro. texts list conditions under which these assumptions may be relaxed. A special form of the t-test is available in cases where population variances are unequal.
*S-2MATCH-INT,A Use the Matched-Pairs t-Test to compare TWO SUB-SAMPLE MEANS on an INTERVAL DEPENDENT VARIABLE, where RANDOM SAMPLING or RANDOM ASSIGNMENT has yielded MATCHED (dependent) SUB-SAMPLES. Valid application of this test assumes that sub-samples were drawn from 2 NORMALLY DISTRIBUTED POPULATIONS. Check to see that your data approximate this assumption: most intro. texts list conditions under which it may be relaxed.
*ARCSINE,A A number of tests are available for comparing 2 dichotomous sub-samples, in cases where RANDOM SAMPLING OR RANDOM ASSIGNMENT has yielded INDEPENDENT SUB-SAMPLES. (They are listed in order of preference.)
The Arcsine Test is the preferred alternative, especially if sample size is small. A Chi-Square Contingency Test, with data cast in a 2-by-2 table, gives similar results when sample size is large. For smaller samples, Fisher's Exact Test may be used. Special forms of the z-test and t-test, which test for DIFFERENCES IN PROPORTIONS, are also applicable. Consult a statistics text for the assumptions underlying each of these tests.
*FISHER-EXACT,A Fisher's Exact Test is usually the best alternative for detecting a difference between INDEPENDENT SUB-SAMPLES when sample size is very small and data can be cast in a 2-by-2 contingency table. Fisher's Exact Test is also used as an alternative to the Chi-Square Contingency Test when sample size is too small to apply the latter: in such cases it is used to test for the significance of an ASSOCIATION BETWEEN 2 DICHOTOMOUS NOMINAL VARIABLES. Although not widely known, Fisher's Exact Test can be extended to tables larger than a 2-by-2: the only problem is finding a computer program that calculates p-values for larger tables.
*MCNEMAR,A The McNemar Test is designed to compare a DICHOTOMOUS DEPENDENT VARIABLE across 2 MATCHED SUB-SAMPLES. The Dependent variable may be inherently dichotomous or transformed to a dichotomy especially for the test. There is NO TEST designed to compare a Dependent variable with 3 or more categories across Matched Sub-Samples. The McNemar Test assumes only Nominal Measurement, but if an Ordinal Dependent variable is dichotomized at the Overall Median, it can be used as a test for differences between Medians for MATCHED SAMPLES.
*MEDIAN-TEST,A The Median Test is designed to compare 2 INDEPENDENT SUB-SAMPLES when the DEPENDENT VARIABLE is ORDINAL and when it is feasible to determine the OVERALL MEDIAN OF THE TOTAL SAMPLE. Although tests based on ranks are preferable, the Median Test is a good alternative when data are "Partially Ordered" or when sample size is so large that it is infeasible to rank the data.
The Median Test is really a "transformation" rather than a distinct test: data are cast in a 2-by-2 contingency table by breaking the Dependent variable at the overall Median; then either the Chi-Square Contingency Test or Fisher's Exact Test is applied, depending on sample size. The Median Test can also be applied when there are 3 or More INDEPENDENT SUB-SAMPLES. In this case, the Dependent variable is again Dichotomized at the OVERALL MEDIAN, but data are cast in a 2-by-k contingency table, where k is the number of sub-samples. Then the Chi-Square Contingency Test is applied.
*WILCOX-MATCH,A The appropriate test for a difference between TWO MATCHED SUB-SAMPLES, when the ORDINAL DEPENDENT VARIABLE is scored as RANKS, is the Wilcoxon Matched-Pairs Test [sometimes called the Matched-Pairs Signed-Ranks Test].
*WILCOX-RSUM,A Two tests, the Wilcoxon Rank-Sum Test and the Mann-Whitney U-Test, can be applied to test for a difference between TWO INDEPENDENT SUB-SAMPLES, when the ORDINAL DEPENDENT VARIABLE is scored as RANKS. These are really two forms of the same test and yield exactly the same p-values. Although the Mann-Whitney is more widely used, the Wilcoxon Rank-Sum Test is much easier to compute and interpret and, therefore, preferable. [Don't confuse this Rank-Sum Test with Wilcoxon's Matched-Pairs Test, which is used for DEPENDENT SUB-SAMPLES.]
*ONEWAY,A The appropriate significance test for differences between Means of three or more INDEPENDENT SUB-SAMPLES is the so-called "ONE-WAY ANOVA F-TEST." This is an "overall" test: it detects differences between pairs or combinations of sub-samples, but it can't specify which sub-samples differ. Thus, it must be followed by more specific tests, called CONTRASTS, to pinpoint which sub-samples differ. Besides assuming INDEPENDENT SUB-SAMPLES and INTERVAL MEASUREMENT, this F-Test assumes that sub-samples were drawn from NORMALLY DISTRIBUTED POPULATIONS that have EQUAL VARIANCES.
Check to see that your data approximate all these assumptions: most intro. texts specify conditions under which they may be relaxed. Consult a specialized text on Analysis of Variance (ANOVA) for help in selecting a test for CONTRASTS following the overall F-Test. [Usually, the Duncan Multiple-Range Test is best for Contrasts between PAIRS of sub-samples and the Scheffe Test best for Contrasts between GROUPS of sub-samples, but there are many other alternatives that may be preferable in your case.]
*TWOWAY,A The best significance test for differences between Means of 3 or more MATCHED SUB-SAMPLES is the ANALYSIS OF VARIANCE F-TEST FOR RANDOMIZED BLOCKS, which is sometimes loosely called "TWO-WAY" ANOVA. In this design, "Blocks" may be individual cases or sets of matched cases, which are represented in all the sub-samples. Blocks are used to "control" extraneous between-case variation. When individual cases appear in all the sub-samples, the design is referred to as a RANDOMIZED BLOCKS DESIGN WITH REPEATED MEASURES. The F-Test is an "overall" test: it detects differences between pairs or combinations of sub-samples, but it can't specify which sub-samples differ. Thus, it must be followed by more specific tests, called CONTRASTS, to pinpoint which sub-samples differ. Besides assuming INTERVAL MEASUREMENT, this F-Test assumes that sub-samples were drawn from NORMALLY DISTRIBUTED POPULATIONS that have EQUAL VARIANCES. Check to see that your data approximate all these assumptions. Specialized texts on Analysis of Variance (ANOVA) usually contain extensive explanations of underlying assumptions and also offer help in selecting a test for CONTRASTS following the overall F-Test.
*CR-FACTORIAL,A ANALYSIS OF VARIANCE with a COMPLETELY RANDOMIZED FACTORIAL (CRF) design is the best alternative when you have: 1) an INTERVAL DEPENDENT VARIABLE, 2) TWO OR MORE COMPARISON VARIABLES, and 3) NO MATCHING of cases across sub-samples of any Comparison Variable.
[The last condition implies that each case appears in the analysis one and only one time.] The CRF design yields an F-Test for each Comparison Variable and also for INTERACTION EFFECTS due to sets of these variables. The F-Tests are "overall" tests: they detect differences between pairs or combinations of sub-samples, but don't specify which sub-samples differ. Thus, they must be followed by more specific tests, called CONTRASTS, to pinpoint which sub-samples differ. Besides INTERVAL MEASUREMENT, the F-Tests assume that the sub-samples were drawn from NORMALLY DISTRIBUTED POPULATIONS that have EQUAL VARIANCES. Check to see that your data approximate all these assumptions. Specialized texts on Analysis of Variance usually contain extensive explanations of underlying assumptions and the conditions under which they may be relaxed. Only a few offer help in selecting the most appropriate test for CONTRASTS in CRF Designs.
*RB-FACTORIAL,A ANALYSIS OF VARIANCE with a RANDOMIZED BLOCKS FACTORIAL (RBF) design is the best alternative if you have: 1) an INTERVAL DEPENDENT VARIABLE, 2) TWO OR MORE COMPARISON VARIABLES, and 3) MATCHED CASES or OBSERVATIONS across sub-samples of one or more Comparison Variables. In this design, "Blocks" may be individual cases or sets of matched cases, which are represented in all the sub-samples of a Comparison Variable. Blocks are used to "control" extraneous between-case variation. When individual cases appear in all the sub-samples of any Comparison Variable, the design is referred to as a RANDOMIZED BLOCKS FACTORIAL DESIGN WITH REPEATED MEASURES. When the Blocks are split into "Sub-Blocks" on one or more "Blocking Variables" the design is referred to as a SPLIT-PLOT DESIGN. The RBF design yields an F-Test for each Comparison Variable and also for INTERACTION EFFECTS due to sets of these variables.
The F-Tests are "overall" tests: they detect differences between pairs or combinations of sub-samples, but don't specify which sub-samples differ. Thus, they must be followed by more specific tests, called CONTRASTS, to pinpoint which of the sub-samples differ. Besides INTERVAL MEASUREMENT, the F-Tests assume that sub-samples were drawn from NORMALLY DISTRIBUTED POPULATIONS that have EQUAL VARIANCES. Check to see that your data approximate all these assumptions. Specialized texts on Analysis of Variance usually contain extensive explanations of underlying assumptions and the conditions under which they may be relaxed. Only a few offer help in selecting the most appropriate test for CONTRASTS in RBF or Split-Plot Designs.
*ANOVA/REGN,A [Traditional ANOVA computations for the above design require EQUAL FREQUENCIES in all the cells created when the sample is split by 2 or more Comparison Variables. If cell frequencies are unequal, F-Ratios can be obtained through Multiple Regression procedures, of which ANOVA is a special case. Most computer programs use Multiple Regression for all ANOVA problems, but hide this fact by reporting results in a conventional ANOVA Summary Table.]
*ANCOVA,A If you have one or more Independent variables that you wish to "control" or "adjust for" without building them in as Comparison Variables, you can apply ANALYSIS OF COVARIANCE (ANCOVA) procedures. ANCOVA is an extension of ANOVA in which the effects of one or more INTERVAL-LEVEL INDEPENDENT VARIABLES are "partialled out," through Multiple Regression procedures, before F-Ratios are computed for the major Comparison Variables. Normally, variables are selected for such adjustment because they create "extraneous" variation in the Dependent Variable and can't be eliminated physically. ANCOVA usually requires a computer and most popular statistical packages can perform it.
To use ANCOVA, you must meet all the assumptions of ANOVA and Multiple Regression, plus some additional ones unique to this procedure. Specialized texts on Analysis of Variance usually explain all these assumptions and the conditions under which they may be relaxed.
*MANOVA,A MULTIVARIATE ANALYSIS OF VARIANCE (MANOVA) is an extension of ANOVA designed to handle two or more INTERVAL-LEVEL DEPENDENT VARIABLES simultaneously. The application of MANOVA and the interpretation of its results require advanced statistical training. If you lack such expertise, and if your theory demands MANOVA, it would be wise to seek help from a statistical consultant before attempting to apply it. It may be wiser yet to choose a procedure that can be applied in separate analyses for each Dependent variable. If the latter alternative is feasible, WATSTAT may be able to offer more help: return to the Choice Boxes and select "Multivariate with ONE Dependent Variable" in Box 4.
*CHI-LOGIST,A Significance tests associated with Logistic Regression PARALLEL those used with Linear Multiple Regression: there are tests for overall fit of the equation as well as for individual Regression Coefficients. However, as Logistic Regression is based on a different equation-fitting criterion, neither the tests nor their interpretations are IDENTICAL to their Linear counterparts. Logistic Regression also has its own set of assumptions and limitations, which you'll need to consider.
*CHI-COMP-NOM,A Use the Chi-Square Contingency Test to determine whether it is likely that your RANDOM SAMPLE was drawn from a set of Sub-Populations (corresponding to your Sub-Samples) that have the same proportion of cases in the various categories of the Dependent Variable. [Chi-Square must be computed on RAW FREQUENCIES: don't make the common beginner's error of computing it from a table of Percentages or Proportions.]
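To make the warning about RAW FREQUENCIES concrete, here is a minimal Python sketch of the Pearson Chi-Square statistic for a contingency table, using the textbook formula (sum of (observed - expected)^2 / expected, with expected = row total x column total / N). The function name and the counts are hypothetical, and no continuity correction is applied.

```python
def chi_square_contingency(table):
    """Return (chi_square, df) for a table of frequencies (a list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return chi2, df


# Hypothetical raw frequencies: 2 sub-samples by 2 categories, N = 100.
counts = [[30, 20], [10, 40]]
chi2, df = chi_square_contingency(counts)

# The beginner's error: the same table expressed as proportions of N.
proportions = [[cell / 100 for cell in row] for row in counts]
wrong_chi2, _ = chi_square_contingency(proportions)

print(round(chi2, 2), df, round(wrong_chi2, 4))
```

With proportions in place of counts, the statistic shrinks by a factor of N, so the result no longer reflects the sample size at all.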
*CHI-PHI,A The appropriate significance test for the Phi Coefficient or Cramer's V is the Chi-Square Contingency Test. Fisher's Exact Test may be used as a test for Phi if sample size is too small for the Chi-Square Test.
*TTEST-BIV-R,A A special t-Test or F-Test is used to test for the significance of the Correlation Coefficient (r) or the Regression Coefficient (b). In the bivariate case, t and F Tests yield exactly the same p-values and tests for r and b are equivalent. Besides requiring INTERVAL MEASUREMENT, these tests assume BIVARIATE NORMALITY. Check to see that your data approximate this assumption: most intro. texts list conditions under which it may be relaxed.
*TTEST-RHO,A A special t-Test is used to test for the significance of Spearman's Rho. The computing formula for this test is the same as that used for the Linear Correlation Coefficient (r) except that Rho replaces r in the computations.
*ZTEST-TAU,A The significance test for Kendall's Tau uses a z-statistic, which is referred to a table of the Standard Normal Distribution to obtain p-values. For sample sizes less than 10, exact tables are available and should be used instead of the Normal approximation.
*FTEST-ETA,A The significance test used for the Correlation Ratio (Eta) is the F-Test obtained from a ONE-WAY ANALYSIS OF VARIANCE.
*FTEST-MULTR,A An F-Test is used to test for the significance of the Multiple Correlation Coefficient. A special t-Test or F-Test (yielding identical p-values) is used to test the significance of each Regression Coefficient in the equation. F-Tests for "R-Square Change" can be used to test whether a set of two or more Independent Variables contributes significantly to the fit of the equation. Valid application of these tests rests on many stringent assumptions: consult a Multiple Regression/Correlation text for information about these assumptions and check to see that your data meet them.
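WATSTAT does not give formulas for these F-Tests, but the usual textbook forms can be sketched as follows. With n cases and k Independent Variables, F = (R-Squared / k) / ((1 - R-Squared) / (n - k - 1)); for "R-Square Change" when m variables are added, the numerator becomes the gain in R-Squared divided by m. The function names and the example figures below are illustrative assumptions.

```python
def f_for_r_squared(r2, n, k):
    """F-Test for the Multiple Correlation: returns (F, (df1, df2))."""
    df2 = n - k - 1
    return (r2 / k) / ((1 - r2) / df2), (k, df2)


def f_for_r_squared_change(r2_full, r2_reduced, n, k_full, m):
    """F-Test for the gain in R-Squared from m added variables.

    k_full is the number of Independent Variables in the larger equation.
    """
    df2 = n - k_full - 1
    f = ((r2_full - r2_reduced) / m) / ((1 - r2_full) / df2)
    return f, (m, df2)


# Hypothetical results: 50 cases, 4 Independent Variables, R-Squared = .40;
# the last 2 variables raised R-Squared from .30 to .40.
f_overall, df_overall = f_for_r_squared(0.40, n=50, k=4)
f_change, df_change = f_for_r_squared_change(0.40, 0.30, n=50, k_full=4, m=2)
print(round(f_overall, 2), df_overall, round(f_change, 2), df_change)
```

Each F is then referred to a table of the F Distribution with the degrees of freedom shown to obtain a p-value.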
*S-LOG-LIN,A Several significance tests are usually applied in a Log-Linear Analysis, all of which are referred to the Chi-Square Distribution to obtain p-values. In addition to a test for overall fit of a Log-Linear Model (analogous to a test for R-Squared in Regression), tests are usually made for MAIN EFFECTS and INTERACTION EFFECTS (analogous to F-Tests in Analysis of Variance).
*S-DISCRIM,A Several F-Tests are usually applied in a Discriminant Analysis, including: a test for fit of each discriminant function, tests for the contribution of each Discriminant Function Coefficient, and tests for differences between groups. Computer programs also use significance tests as criteria for including variables and for terminating the analysis. [The validity of these criteria, like ALL significance tests, rests on the assumption of Random Sampling.]
*S-FACTOR-ANAL,A Numerous tests can be applied in Factor Analysis, including tests for Factor Loadings, Correlations between Factors, and the Number of Factors. When the focus is on description, as it is in so-called "Exploratory Factor Analysis," there is usually no need for any tests. However, significance tests become central when the Factor Analysis is used to address theoretical hypotheses, as in "Confirmatory Factor Analysis."
*S-KENDALL-W,A The significance test for Kendall's W uses exact tables when sample size and the number of variables are small. Otherwise, a Chi-Square statistic is used. The Null Hypothesis tested is that the sample was drawn from a population in which the variables are mutually Independent.
*S-COCHRANQ,A Cochran's Q Test is designed to compare a DICHOTOMOUS DEPENDENT VARIABLE across 3 or more MATCHED SUB-SAMPLES. The Dependent variable may be inherently dichotomous or transformed to a dichotomy especially for the Q-test. There is NO TEST designed to compare a Dependent variable with 3 or more categories across Matched Sub-Samples.
Cochran's Q Test assumes only Nominal Measurement, but if an Ordinal Dependent variable is dichotomized at the OVERALL MEDIAN, it can be used to test the Null Hypothesis that Matched Sub-Samples were RANDOMLY drawn from Populations with the same Median.

*KRUSKAL,A The Kruskal-Wallis Test is designed to compare an ORDINAL DEPENDENT VARIABLE across 3 or more INDEPENDENT SUB-SAMPLES. If the Dependent variable is not inherently Ranked, it must be transformed to Ranks for the test. The Kruskal-Wallis is an analogue of One-Way ANOVA and uses a Chi-Square test statistic in place of the ANOVA F-Test.

*FRIEDMAN,A The Friedman Test is designed to compare an ORDINAL DEPENDENT VARIABLE across 3 or more MATCHED SUB-SAMPLES. If the Dependent variable is not inherently Ranked, it must be transformed to Ranks for the test. This test is an analogue of "Two-Way ANOVA" (Randomized Blocks ANOVA) and uses a Chi-Square test statistic in place of the ANOVA F-Test.

*S-COMP2-RANK,A There is no well-known significance test for Ordinal data that can handle 2 or more Independent (Comparison) Variables in a single analysis. That is, there are no Ordinal-Level analogues to Factorial ANOVA, Analysis of Covariance, etc., which are used with Interval Dependent Variables.

*S-COMP2-DICH,A There is no test designed to compare a DICHOTOMOUS DEPENDENT VARIABLE across SUB-SAMPLES created by 2 or more Independent (Comparison) variables. However, if it's appropriate to shift the Analytical Focus from "Sub-Sample Comparison" to "Association," a number of alternatives are open. Among these are Logistic Regression and Discriminant Analysis. If your Analytical Focus can be changed in this way -- if it MAKES SENSE to cast your research questions in terms of Association -- return to WATSTAT's Choice Boxes and select "No Sub-Sample Comparisons" in Box 2 and "Describe Association" in Box 3. WATSTAT's Report will then give you more information about Logistic Regression and Discriminant Analysis.
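NOTE: As a rough sketch of the Kruskal-Wallis entry above, the Python code below transforms the pooled scores to Ranks (tied scores share the average of their ranks) and computes the test statistic H = 12 / (N(N + 1)) * sum(Rj^2 / nj) - 3(N + 1), where Rj is the rank sum and nj the size of sub-sample j. H is referred to the Chi-Square Distribution with k - 1 degrees of freedom for k sub-samples. The function names and data are hypothetical and are not part of WATSTAT, and the usual correction for ties is omitted.

```python
def average_ranks(values):
    """Rank values 1..N, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0  # positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Return (H, df) for a list of sub-samples; no tie correction applied."""
    if len(groups) < 2 or any(len(g) == 0 for g in groups):
        raise ValueError("need at least two non-empty sub-samples")
    pooled = [v for g in groups for v in g]
    n_total = len(pooled)
    ranks = average_ranks(pooled)
    weighted = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        weighted += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n_total * (n_total + 1)) * weighted - 3.0 * (n_total + 1)
    return h, len(groups) - 1


# Hypothetical scores for three Independent Sub-Samples.
samples = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
h, df = kruskal_wallis_h(samples)
print(h, df)
```

Ranking the pooled scores first, and only then splitting the ranks back into sub-samples, is what makes the statistic depend on order alone rather than on the original score values.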
*S-COMP2-NOM-IND,A There is no test designed to compare a NOMINAL DEPENDENT VARIABLE across SUB-SAMPLES created by 2 or more Independent (Comparison) variables. If it's appropriate to change your Analytical Focus from "Sub-Sample Comparison" to "Association," a number of alternatives are open, namely, Log-Linear Analysis, Logistic Regression, and Discriminant Analysis. If it MAKES SENSE to re-cast your research questions in terms of Association, return to WATSTAT's Choice Boxes and select "No Sub-Sample Comparisons" in Box 2 and "Describe Association" in Box 3. WATSTAT's Report will then give you more information about the above alternatives. [All these alternatives require advanced statistical training: a wise novice will seek expert help.]

*S-COMP2-NOM-MATCH,A There is NO MULTIVARIATE TEST designed to compare a NOMINAL DEPENDENT VARIABLE across MATCHED SUB-SAMPLES created by 2 or more Comparison variables. If you haven't yet collected the data, consider ways to achieve an Interval-Level measure of the Dependent variable. If the data are already collected, and if it's appropriate and feasible to dichotomize the Dependent variable, you may be able to use ANOVA F-Tests. [This will also require a so-called ARCSINE TRANSFORMATION before ANOVA can be applied to a Dichotomous Dependent variable.] If either of these options is viable in your case, return to WATSTAT's Choice Boxes and select "Interval" in Box 5.

*COPYRIGHT,A COPYRIGHT 1991 BY HAWKEYE SOFTWORKS, 300 GOLFVIEW AVE., IOWA CITY, IA, 52246